Information Extraction in Semantic, Highly-Structured, and Semi-Structured Web Sources

Authors: Víctor M. Alonso-Rorís, Juan M. Santos Gago, Roberto Pérez Rodríguez, Carlos Rivas Costa, Miguel A. Gómez Carballa, Luis Anido Rifón

Polibits, Vol. 49, pp. 69-75, 2014.

Abstract: The evolution of the Web from the original proposal made in 1989 can be considered one of the most revolutionary technological changes in centuries. During the past 25 years the Web has evolved from a static version to a fully dynamic and interoperable intelligent ecosystem. The amount of data produced during these few decades is enormous. New applications, developed by individual developers or small companies, can take advantage of both services and data already present on the Web. Data, produced by humans and machines, may be available in different formats and through different access interfaces. This paper analyses three different types of data available on the Web and presents mechanisms for accessing and extracting this information. The authors show several applications that leverage extracted information in two areas of research: recommendations of educational resources beyond content and interactive digital TV applications.

Keywords: Information extraction, web data processing, semantic enrichment, data mining, web scraping

PDF: Information Extraction in Semantic, Highly-Structured, and Semi-Structured Web Sources